 interaction energy


$Δ$-ML Ensembles for Selecting Quantum Chemistry Methods to Compute Intermolecular Interactions

Wallace, Austin M., Sherrill, C. David, Krishnan, Giri P.

arXiv.org Artificial Intelligence

Ab initio quantum chemical methods for accurately computing interactions between molecules have a wide range of applications but are often computationally expensive. Because the performance of these methods varies, selecting an appropriate method based on accuracy and computational cost remains a significant challenge. In this work, we propose a framework based on an ensemble of $Δ$-ML models trained on features extracted from a pre-trained atom-pairwise neural network to predict the error of each method relative to all other methods, including the "gold standard" coupled cluster with single, double, and perturbative triple excitations at the estimated complete basis set limit [CCSD(T)/CBS]. Our proposed approach provides error estimates across various levels of theory and identifies the most computationally efficient approach for a given error range while using only a subset of the dataset. Further, this approach allows comparison between various theories. We demonstrate the effectiveness of our approach using an extended BioFragment dataset, which includes the interaction energies for common biomolecular fragments and small organic dimers. Our results show that the proposed framework achieves mean absolute errors below 0.1 kcal/mol regardless of the given method. Furthermore, by analyzing all-to-all $Δ$-ML models for the levels of theory considered, we identify method groupings that align with theoretical hypotheses, providing evidence that $Δ$-ML models can easily learn corrections from any level of theory to any other level of theory.
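The core $Δ$-ML idea in the abstract above, learning the error of a cheap method relative to a reference and adding it back as a correction, can be sketched as follows. This is a minimal illustration only: it uses a single least-squares line on one made-up descriptor rather than the paper's ensemble on neural-network features, and every number and name (`features`, `e_cheap`, `e_ref`) is a hypothetical placeholder, not data from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mae(pred, ref):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(p - r) for p, r in zip(pred, ref))


# Hypothetical toy data in kcal/mol: one descriptor per dimer, the energy
# from a cheap method, and the energy from a reference method.
features = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
e_cheap = [-1.0, -2.1, -2.9, -4.2, -5.0, -6.1]
e_ref = [-1.3, -2.6, -3.7, -5.1, -6.3, -7.5]

# Delta-ML target: the error of the cheap method relative to the reference.
deltas = [r - c for r, c in zip(e_ref, e_cheap)]
a, b = fit_line(features, deltas)

# Corrected prediction = cheap energy + learned correction.
e_corrected = [c + a * x + b for c, x in zip(e_cheap, features)]

mae_cheap = mae(e_cheap, e_ref)
mae_corrected = mae(e_corrected, e_ref)
print(f"MAE cheap: {mae_cheap:.3f}  MAE corrected: {mae_corrected:.3f}")
```

An all-to-all setup in the spirit of the abstract would repeat this fit for every ordered pair of methods; the in-sample comparison here only shows the mechanics and is not a measure of generalization.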


Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

Rux, Nicolaj, Quellmalz, Michael, Steidl, Gabriele

arXiv.org Machine Learning

Negative distance kernels $K(x,y) := - \|x-y\|$ have been used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, due to its non-smoothness at $x=y$, most of the classical theoretical results, e.g. on Wasserstein gradient flows of the corresponding MMD functional, no longer hold true. In this paper, we propose a new kernel which keeps the favorable properties of the negative distance kernel, namely being conditionally positive definite of order one with a nearly linear increase towards infinity and a simple slicing structure, but which is now Lipschitz differentiable. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs about as well as the negative distance kernel in gradient descent methods, but now with theoretical guarantees.
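For orientation, the squared MMD between two samples under the negative distance kernel $K(x,y) = -\|x-y\|$ can be estimated as below. This is a sketch of the plain, unsmoothed kernel only; it does not implement the smoothed kernel or the slicing technique from the paper, and the function names are illustrative.

```python
import math


def neg_distance_kernel(x, y):
    """Negative distance kernel K(x, y) = -||x - y||."""
    return -math.dist(x, y)


def mmd_squared(xs, ys, kernel=neg_distance_kernel):
    """Biased (V-statistic) estimate of MMD^2 between two point samples."""

    def mean_k(a, b):
        return sum(kernel(p, q) for p in a for q in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Two small hypothetical samples in the plane.
sample_a = [(0.0, 0.0), (1.0, 0.0)]
sample_b = [(3.0, 0.0), (4.0, 0.0)]

print(mmd_squared(sample_a, sample_a))  # identical samples
print(mmd_squared(sample_a, sample_b))  # shifted samples
```

With this kernel the quantity coincides with the energy distance between the two empirical distributions, which is why it is zero for identical samples and grows as the samples move apart.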


Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization

Tankala, Chandan, Nagaraj, Dheeraj M., Raj, Anant

arXiv.org Machine Learning

Gradient flow in the 2-Wasserstein space is widely used to optimize functionals over probability distributions and is typically implemented using an interacting particle system with $n$ particles. Analyzing these algorithms requires showing (a) that the finite-particle system converges and/or (b) that the resultant empirical distribution of the particles closely approximates the optimal distribution (i.e., propagation of chaos). However, establishing efficient sufficient conditions can be challenging, as the finite particle system may produce heavily dependent random variables. In this work, we study the virtual particle stochastic approximation, originally introduced for Stein Variational Gradient Descent. This method can be viewed as a form of stochastic gradient descent in the Wasserstein space and can be implemented efficiently. In popular settings, we demonstrate that our algorithm's output converges to the optimal distribution under conditions similar to those for the infinite particle limit, and it produces i.i.d. samples without the need to explicitly establish propagation of chaos bounds.
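To make the notion of an interacting particle system concrete, here is a sketch of the standard Stein Variational Gradient Descent update in one dimension. It is the classical deterministic SVGD step, not the virtual particle stochastic approximation studied in the paper; the RBF kernel, the bandwidth, the step size, and the standard normal target are all assumptions chosen for illustration.

```python
import math


def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One standard SVGD update for 1D particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Gradient of k with respect to xj: the repulsive term that
            # keeps particles from collapsing onto a single mode.
            grad_k = -(xj - xi) / bandwidth ** 2 * k
            phi += k * grad_log_p(xj) + grad_k
        updated.append(xi + step * phi / n)
    return updated


def grad_log_standard_normal(x):
    """Score function of the assumed target N(0, 1)."""
    return -x


# Hypothetical initialization far from the target's mean.
particles = [2.0, 2.5, 3.0, 3.5, 4.0]
for _ in range(500):
    particles = svgd_step(particles, grad_log_standard_normal)

svgd_mean = sum(particles) / len(particles)
print(particles, svgd_mean)
```

Every particle's update depends on all the others, which is the source of the statistical dependence that the abstract identifies as the obstacle to analysis.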


OpenQDC: Open Quantum Data Commons

Gabellini, Cristian, Shenoy, Nikhil, Thaler, Stephan, Canturk, Semih, McNeela, Daniel, Beaini, Dominique, Bronstein, Michael, Tossou, Prudencio

arXiv.org Artificial Intelligence

Machine Learning Interatomic Potentials (MLIPs) are a highly promising alternative to force-fields for molecular dynamics (MD) simulations, offering precise and rapid energy and force calculations. However, Quantum-Mechanical (QM) datasets, crucial for MLIPs, are fragmented across various repositories, hindering accessibility and model development. We introduce the openQDC package, consolidating 37 QM datasets from over 250 quantum methods and 400 million geometries into a single, accessible resource. These datasets are meticulously preprocessed and standardized for MLIP training, covering a wide range of chemical elements and interactions relevant in organic chemistry. OpenQDC includes tools for normalization and integration, easily accessible via Python. Experiments with well-known architectures like SchNet, TorchMD-Net, and DimeNet reveal challenges for those architectures and constitute a leaderboard to accelerate benchmarking and guide the development of novel algorithms. Continuously adding datasets to OpenQDC will democratize QM dataset access, foster more collaboration and innovation, enhance MLIP development, and support their adoption in the MD field.


AUGUR, A flexible and efficient optimization algorithm for identification of optimal adsorption sites

Kouroudis, Ioannis, Poonam, Misciaci, Neel, Mayr, Felix, Müller, Leon, Gu, Zhaosu, Gagliardi, Alessio

arXiv.org Artificial Intelligence

Novel, functional structures at the nanoscale could be crucial for transforming a broad spectrum of economically significant processes into greener and more sustainable solutions. For instance, nanostructured materials hold the potential to significantly enhance the cost-effectiveness of fuel-cell devices [1], enable the creation of highly efficient quantum-dot LEDs [2], and pave the way for generating atom-precise efficient nanocatalysts for studying novel catalytic pathways in electrochemical applications [3, 4]. As performance depends strongly on specific structural characteristics that often cannot easily be resolved in lab experiments, computational chemistry, most often through Density Functional Theory (DFT) based approaches, can be used to generate in-silico insights. Typical questions range from elucidating which feature of a given nanoparticle might improve catalytic performance to mechanistic explanations for key synthesis procedures, allowing tailored experiments to drive up experimental yields for optimal structures. Commonly, these questions are associated with finding energetically favorable configurations on the potential energy surface (PES) of a system, a task relevant to a wide range of problems in computational chemistry. The established methodology allows finding "docking" mechanisms between small molecules and large biomolecules, which is relevant for drug development [5]. Additionally, a large area of research revolves around the sensing of harmful gases by novel nanomaterials chosen according to the strength of their interactions.


The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture

Sriram, Anuroop, Choi, Sihoon, Yu, Xiaohan, Brabson, Logan M., Das, Abhishek, Ulissi, Zachary, Uyttendaele, Matt, Medford, Andrew J., Sholl, David S.

arXiv.org Artificial Intelligence

New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed $CO_2$ and/or $H_2O$. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.


Deepmind Open-Sources DM21: A Deep Learning Model For Quantum Chemistry

#artificialintelligence

DM21 outperforms standard models on various benchmarks and is available as an extension to the PySCF simulation framework. The model is described in a paper published in Science. DM21 uses a neural network to approximate the energy density functional component of Density Functional Theory (DFT), which describes the quantum mechanical behavior of molecules. It corrects systematic flaws in prior functional approximations, which failed to treat systems with "fractional electron character" appropriately. The model uses a multilayer perceptron (MLP) architecture operating on a grid of electron densities.


Multi-Fidelity Gaussian Process based Empirical Potential Development for Si:H Nanowires

Kim, Moonseop, Yin, Huayi, Lin, Guang

arXiv.org Machine Learning

In material modeling, calculations with empirical potentials are fast compared with first-principles calculations, but the results are less accurate. First-principles calculations are accurate but slow and computationally expensive. In this work, first, the H-H binding energy and H$_2$-H$_2$ interaction energy are calculated using first-principles calculations, and these values can be applied to the Tersoff empirical potential. Second, the H-H parameters are estimated. After fitting the H-H parameters, the mechanical properties are obtained. Finally, to integrate both the low-fidelity empirical potential data and the data from the high-fidelity first-principles calculations, multi-fidelity Gaussian process regression is employed to predict the H-H binding energy and the H$_2$-H$_2$ interaction energy. Numerical results demonstrate the accuracy of the developed empirical potentials.


QM/ML: Datasets

#artificialintelligence

Water dimer (O-O distances 4.5 Å, geometries sampled from a 300 K MD using the AMOEBA forcefield), interaction energies and forces with counterpoise correction, using MP2 / AVDZ, AVTZ, AVQZ (10k, 10k, 1k configurations, respectively).
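As a reminder of what a counterpoise-corrected interaction energy means, the sketch below applies the standard Boys-Bernardi scheme: the dimer energy minus each monomer's energy, with all three computed in the full dimer basis. The hartree values are hypothetical placeholders and are not taken from the dataset; only the conversion factor is a standard constant.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def counterpoise_interaction_energy(e_dimer, e_monomer_a, e_monomer_b):
    """Counterpoise-corrected interaction energy in kcal/mol.

    All three inputs are energies in hartree evaluated in the full dimer
    basis: the monomer calculations keep the partner's basis functions
    as ghost functions, which removes basis set superposition error.
    """
    e_int_hartree = e_dimer - e_monomer_a - e_monomer_b
    return e_int_hartree * HARTREE_TO_KCAL_PER_MOL


# Hypothetical energies in hartree for one water dimer configuration.
e_int = counterpoise_interaction_energy(
    e_dimer=-152.5000,
    e_monomer_a=-76.2460,
    e_monomer_b=-76.2465,
)
print(f"{e_int:.3f} kcal/mol")
```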